1.
Bioinformatics Research and Applications, ISBRA 2022 ; 13760:369-380, 2022.
Article in English | Web of Science | ID: covidwho-2309148

ABSTRACT

Clustering viral sequences allows us to characterize the composition and structure of intrahost and interhost viral populations, which play a crucial role in disease progression and epidemic spread. In this paper, we propose and validate a new entropy-based method for clustering aligned viral sequences treated as categorical data. The method finds a homogeneous clustering by minimizing information entropy, rather than the distance between sequences in the same cluster. We have applied our entropy-based clustering method to SARS-CoV-2 viral sequencing data, and we report the information content that it extracts from the sequences. Our method converges to similar minimum-entropy clusterings across different runs and limited permutations of the data. We also show that a parallelized version of our tool scales to very large SARS-CoV-2 datasets.
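The abstract does not give the exact objective, so the following is only a minimal sketch of how the entropy of a clustering of aligned sequences could be scored. It assumes that each alignment column is a categorical variable, that a cluster's entropy is the sum of its per-column Shannon entropies, and that a clustering is scored by the size-weighted mean over clusters. The function names are hypothetical and are not taken from the authors' tool.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def cluster_entropy(sequences):
    """Sum of per-column entropies for equal-length aligned sequences.

    Assumption: columns are treated as independent categorical variables.
    """
    return sum(column_entropy(col) for col in zip(*sequences))


def clustering_entropy(clusters):
    """Size-weighted mean of the cluster entropies (assumed objective)."""
    total = sum(len(cluster) for cluster in clusters)
    return sum(len(c) / total * cluster_entropy(c) for c in clusters)


# Toy data: the same four sequences grouped in two different ways.
homogeneous = [["AAAA", "AAAA"], ["CCCC", "CCCC"]]
mixed = [["AAAA", "CCCC"], ["AAAA", "CCCC"]]
print(clustering_entropy(homogeneous))  # identical sequences per cluster
print(clustering_entropy(mixed))        # two symbols in every column
```

Under these assumptions, a clustering whose clusters contain identical sequences scores zero, and any grouping that mixes distinct sequences scores higher, which is the sense in which a minimum-entropy clustering is homogeneous.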

2.
18th International Symposium on Bioinformatics Research and Applications, ISBRA 2022 ; 13760 LNBI:369-380, 2022.
Article in English | Scopus | ID: covidwho-2265112

ABSTRACT

Clustering viral sequences allows us to characterize the composition and structure of intrahost and interhost viral populations, which play a crucial role in disease progression and epidemic spread. In this paper, we propose and validate a new entropy-based method for clustering aligned viral sequences treated as categorical data. The method finds a homogeneous clustering by minimizing information entropy, rather than the distance between sequences in the same cluster. We have applied our entropy-based clustering method to SARS-CoV-2 viral sequencing data, and we report the information content that it extracts from the sequences. Our method converges to similar minimum-entropy clusterings across different runs and limited permutations of the data. We also show that a parallelized version of our tool scales to very large SARS-CoV-2 datasets. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.

3.
Computational Advances in Bio and Medical Sciences ; 12686:127-141, 2021.
Article in English | Web of Science | ID: covidwho-2003651

ABSTRACT

With more than half a million SARS-CoV-2 sequences available and counting, many approaches have recently appeared that aim to leverage this information to understand the genomic diversity and dynamics of this virus. Early approaches involved building transmission networks or phylogenetic trees; for the latter, scalability becomes more of an issue each day because of their high computational complexity. In this work, we propose an alternative approach based on clustering sequences to identify novel subtypes of SARS-CoV-2 using methods designed for haplotyping intra-host viral populations. We assess this approach using cluster entropy, a notion that very naturally captures the underlying process of viral mutation; this is the first time entropy has been used in this context. Using our approach, we were able to identify the well-known B.1.1.7 subtype from the sequences of the EMBL-EBI (UK) database, and we also show that the associated cluster is consistent with a measure of fitness. This demonstrates that our approach is a viable and scalable alternative for unveiling trends in the spread of SARS-CoV-2.

4.
17th International Symposium on Bioinformatics Research and Applications, ISBRA 2021 ; 13064 LNBI:165-175, 2021.
Article in English | Scopus | ID: covidwho-1565307

ABSTRACT

The unprecedented level of genome sequencing during the SARS-CoV-2 pandemic brought about the challenge of processing this genomic data. However, state-of-the-art phylogenetic methods were mostly designed for analyzing significantly sparser data and require extensive subsampling of strains. We present (ε, τ)-MSN, a novel tool that reconstructs a viral genetic relatedness network based on genetic distances and can process hundreds of thousands of sequences within several hours. We applied (ε, τ)-MSN to the global COVID-19 outbreak data and were able to build a genetic network on more than 100,000 SARS-CoV-2 sequences. We show that (ε, τ)-MSN can accurately detect transmission events and build a genetic network with significantly higher assortativity with respect to the continent and country attributes of SARS-CoV-2 samples. The source code for this software suite is available at https://github.com/Sergey-Knyazev/eMST. © 2021, Springer Nature Switzerland AG.
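To illustrate the general idea of a relatedness network built from genetic distances, here is a minimal sketch that links aligned sequences whose Hamming distance falls at or below a cutoff. This is a plain threshold graph, not the (ε, τ)-MSN algorithm, whose construction the abstract does not describe. The function names and the `max_dist` parameter are hypothetical, and the sequences are toy data.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_network(seqs, max_dist):
    """Return edges (u, v, d) for every pair with Hamming distance <= max_dist.

    seqs maps a sample name to its aligned sequence. max_dist is a
    hypothetical cutoff chosen for illustration only.
    """
    edges = []
    for (u, su), (v, sv) in combinations(seqs.items(), 2):
        d = hamming(su, sv)
        if d <= max_dist:
            edges.append((u, v, d))
    return edges


# Toy data: s1 and s2 differ at one position, s3 is far from both.
samples = {"s1": "ACGT", "s2": "ACGA", "s3": "TTTT"}
print(distance_network(samples, max_dist=1))
```

This sketch compares every pair of sequences, so its cost grows quadratically with the number of samples. It is meant only to show what a distance-based network is, not how to build one at the scale reported in the abstract.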
